The Hierarchical Backbone of Complex Networks 
Luciano da Fontoura Costa 
Institute of Physics of Sao Carlos, University of Sao Paulo,
Sao Carlos, SP, PO Box 369, 13560-970, phone +55 1 62 73 9858, 
FAX +55 162 71 3616, Brazil, lucianomfsc.usp.br 
(Dated: February 2, 2008) 

Given any complex directed network, a set of acyclic subgraphs, the hierarchical backbone of the network, can be extracted that provides valuable information about its hierarchical structure. The current paper shows how the interpretation of the network weight matrix as a transition matrix allows the hierarchical backbone to be identified and characterized in terms of the concepts of hierarchical degree, which expresses the total number of virtual edges established along successive transitions, and of hierarchical successors, namely the number of nodes accessible from a specific node while moving along successive hierarchical levels. The potential of the proposed approach is illustrated with respect to word associations and gene sequencing data.

PACS numbers: 89.75.Fb, 02.10.Ox, 89.75.Da, 87.80.Tq



Although the study and characterization of complex networks has often relied on simple measurements such as the average node degree, the clustering coefficient and the average path length, such features do not provide direct insights about several relevant properties of the analyzed networks. While such limitations have been acknowledged from time to time and complementary measures have been duly proposed in the literature, including the connectivity correlation and betweenness centrality, relatively little attention has been given to measurements or algorithms capable of comprehensively expressing the hierarchical structure of complex networks, and only more recently has attention been focused on their hierarchy. Indeed, even if such networks often involve cycles, their hierarchical structure can be identified and characterized in terms of concepts such as the hierarchical successors and hierarchical degrees, introduced herein. The present work concentrates on directed, weighted complex networks (digraphs), illustrating the potential of the suggested concepts and algorithms with respect to complex networks derived from word association psychophysical experiments and from gene sequencing in zebrafish. It is argued that the hierarchical degree density represents a natural extension of the classical node degree density, capable of providing additional information about the network hierarchy and connectivity.

Let the complex weighted directed network (or digraph) $\Gamma$ be represented in terms of its nodes $k = 1, \ldots, N$ and its directed edges, expressed as pairs $(i, j)$ with respective weights $w_{i,j}$, which are stored in the weight matrix as $W(j, i) = w_{i,j}$. Self-connections are not considered in this work. Let $\delta_T(a)$ be the operator acting elementwise over its argument $a$ (which can be a scalar, vector or matrix) in such a way that each resulting element is assigned the value one whenever the respective element of $a$ has absolute value larger than the specified threshold $T$, and zero otherwise; for instance, $\delta_2((4, -1, 0, -3, 2)) = (1, 0, 0, 1, 0)$. When $\delta_T$ acts over a matrix $A$, this reads

$$\delta_T(A)(i, j) = \begin{cases} 1 & \text{if } |A(i, j)| > T, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

Thus, the adjacency matrix underlying the complex network $\Gamma$ can be expressed as $K = \delta_0(W)$. The indegree $Id_k = \sum_{i=1}^{N} W(k, i)$ of node $k$ corresponds to the total weight of its incoming edges, and the outdegree $Od_k = \sum_{i=1}^{N} W(i, k)$ to the total weight of its outgoing edges. The hierarchical successors of node $k$ reached along $t$ transitions from the initial state correspond to the non-zero entries of the vector

$$\vec{s}(t) = \delta_0\left( \sum_{i=0}^{t} K^{i} \, \vec{x}^{(k)} \right), \tag{2}$$

where $\vec{x}^{(k)}$ has all elements zero except the $k$-th element, which is set to 1. The successors of $k$ reached exactly at transition $t$ can therefore be determined as the non-zero entries of

$$\hat{s}(t) = \vec{s}(t) - \vec{s}(t - 1). \tag{3}$$

The number of successors of node $k$ at $t$, without repetitions, henceforth $u_k(t)$ (the number of non-zero entries of $\hat{s}(t)$), provides valuable information about the ramifications of the network along its hierarchical levels $t$. For instance, if the network is a complete branching tree with $r$ branches at each fork, the root node $k$ leads to $u_k(t) = r^t$, i.e. an exponential growth with $t$. Even for a network containing cycles, the successors of each of its constituent nodes $k$ will have a finite maximum depth $D_k$, defined as the value of $t$ such that $u_k(t + 1) = 0$, because each node is counted at most once and the network is finite.

The nodes identified by the non-zero elements of $\vec{s}(D_k)$, together with the edges between them, define a subgraph of $\Gamma$, henceforth called the $k$-component $C_k$ of $\Gamma$, whose number of nodes is denoted by $N_k$.
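The definitions above can be sketched in Python with NumPy, assuming the convention $W(j, i) = w_{i,j}$ (column $i$ holds the outgoing edges of node $i$) and 0-based node indices. The helper names `delta`, `degrees` and `successors` are hypothetical and this is an illustrative sketch, not the author's code: `successors` accumulates the nodes reached within $t$ transitions, counts the nodes reached for the first time at each $t$, and stops when no new node appears, returning $u_k(t)$ for $t = 1, \ldots, D_k$ and the nodes of $C_k$ (including $k$ itself).

```python
import numpy as np


def delta(a, T=0.0):
    """delta_T operator: 1 where |a| > T, 0 elsewhere (elementwise)."""
    return (np.abs(np.asarray(a, dtype=float)) > T).astype(int)


def degrees(W):
    """In- and outdegrees, assuming W[j, i] holds the weight of edge i -> j."""
    W = np.asarray(W, dtype=float)
    indegree = W.sum(axis=1)   # Id_k = sum_i W(k, i): row sums
    outdegree = W.sum(axis=0)  # Od_k = sum_i W(i, k): column sums
    return indegree, outdegree


def successors(W, k):
    """Return [u_k(1), ..., u_k(D_k)] and the nodes of the k-component C_k."""
    K = delta(W)                      # adjacency matrix K = delta_0(W)
    x = np.zeros(K.shape[0], dtype=int)
    x[k] = 1                          # x^(k): only the k-th element is 1
    s_prev = x.copy()                 # s(0): node k itself
    walk = x.copy()                   # nodes reachable by walks of length exactly t
    u = []
    while True:
        walk = delta(K @ walk)        # binarize to keep only reachability
        s = delta(s_prev + walk)      # s(t): nodes reached within t transitions
        new = s - s_prev              # nodes reached for the first time at t
        if new.sum() == 0:            # u_k(t) = 0 here, so D_k = len(u)
            break
        u.append(int(new.sum()))
        s_prev = s
    return u, np.flatnonzero(s_prev)
```

For a complete binary tree rooted at node 0 this gives `u == [2, 4]`, i.e. $r^t$ with $r = 2$; for the 3-node cycle 0 → 1 → 2 → 0 it stops at $D_0 = 2$ despite the cycle, because nodes are never counted twice.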
Given any $k$-component $C_k$ of $\Gamma$, it is possible to obtain a respective acyclic graph $G_k$, henceforth called the $k$-th hierarchical component of $\Gamma$, by using the following algorithm:

Include all nodes reachable from k after one transition into the set S, and store the respective outgoing edges of k into the new adjacency matrix $K_G$;
Remove all incoming edges to k;
While S is not empty:
  Remove all edges between the elements in S;
  Empty the set R;
  For each element p of S, in any order:
    Include all nodes reachable from p after one transition into the set R, incorporating the outgoing edges of p into the new adjacency matrix $K_G$;
    Remove all incoming edges to p;
  Do S = R;
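A minimal Python/NumPy sketch of the extraction algorithm above, under the same $W(j, i)$ convention and 0-based indices; the function name `hierarchical_component` is hypothetical. The working adjacency matrix `A` is a copy, so removing edges does not alter `W`, and the returned matrix is the elementwise product of the collected adjacency matrix with `W`, i.e. the weights of $G_k$.

```python
import numpy as np


def hierarchical_component(W, k):
    """Extract the acyclic hierarchical component G_k rooted at node k.

    Assumes W[j, i] is the weight of edge i -> j. Returns the weight
    matrix of G_k (elementwise product of its adjacency matrix with W).
    """
    W = np.asarray(W, dtype=float)
    A = (np.abs(W) > 0).astype(int)   # working copy of the adjacency matrix
    np.fill_diagonal(A, 0)            # self-connections are not considered
    KG = np.zeros_like(A)             # adjacency matrix of G_k being built

    S = set(np.flatnonzero(A[:, k]).tolist())  # reachable from k in one step
    for j in S:
        KG[j, k] = 1                  # keep the outgoing edges of k
    A[k, :] = 0                       # remove all incoming edges to k

    while S:
        idx = sorted(S)
        A[np.ix_(idx, idx)] = 0       # remove edges between elements of S
        R = set()
        for p in idx:                 # order within S does not change the result
            succ = np.flatnonzero(A[:, p]).tolist()
            R.update(succ)
            for j in succ:
                KG[j, p] = 1          # incorporate the outgoing edges of p
            A[p, :] = 0               # remove all incoming edges to p
        S = R

    return KG * W                     # weights of G_k
```

For instance, with edges 0 → 1, 1 → 2, 2 → 0 and 2 → 3, the edge 2 → 0 closes a cycle and is dropped because incoming edges to the root are removed, leaving the chain 0 → 1 → 2 → 3.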




FIG. 1: A simple graph illustrating the existence of a virtual edge between nodes 1 and 3, defined at $t = 2$ by the fact that $W^2(3, 1) = 8$.



The weight matrix of the acyclic graph $G_k$ is now given as $W_{G_k} = K_G \circ W$, where $\circ$ is the elementwise product between two matrices. The set of such acyclic subgraphs $G_k$, $k = 1, \ldots, N$, is henceforth understood as the hierarchical backbone of the complex network $\Gamma$. Since the network $\Gamma$ produces $N$ hierarchical components, some criterion, such as the largest number of nodes, can be used to identify the most significant ones. It is now possible to define the hierarchical outdegree of each node $k$ of $G_k$ at transition $t$, henceforth $h_k(t)$, as the sum of the weights of the virtual edges established between $k$ and any other nodes of $G_k$ exactly at that instant. As a virtual link from node $k$ to node $j$ is understood to exist at $t$ whenever $W_{G_k}^{t}(j, k) \neq 0$, as illustrated in Figure 1, the hierarchical degree of $k$ in $G_k$ can be obtained as

$$h_k(t) = \sum_{j=1}^{N} W_{G_k}^{t}(j, k). \tag{4}$$

The cumulative hierarchical degree of node $k$ after $t$ transitions, henceforth $H_k^{(t)}$, is given by

$$H_k^{(t)} = \sum_{i=1}^{t} h_k(i). \tag{5}$$

It is suggested in this paper that the hierarchical degree, as well as its cumulative version, provides a natural and powerful extension of the traditional concept of node degree (which coincides with $h_k(t = 1)$), capable of providing more comprehensive information about the hierarchical and connectivity properties of the graph topology. Observe that the number of nodes in the subgraph $G_k$ can also be obtained from the numbers of hierarchical successors $u_k(t)$ accumulated over $t = 1, \ldots, D_k$. Figure 2 shows a simple network (a) and its hierarchical components $G_1$ and $G_{10}$.
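As an illustration, the hierarchical degree and its cumulative version can be computed from successive powers of the weight matrix of $G_k$. This is a sketch assuming NumPy, the same $W(j, i)$ convention, and a hypothetical function name `hierarchical_degrees`. The check below uses hypothetical weights 2 and 4 on a chain 1 → 2 → 3 (0-based: 0 → 1 → 2), chosen only so that the virtual edge weight matches the value 8 quoted for Figure 1.

```python
import numpy as np


def hierarchical_degrees(WG, k, t_max):
    """Return h_k(t) and H_k^(t) for t = 1..t_max.

    WG is the weight matrix of G_k with WG[j, i] = weight of edge i -> j,
    so column k of WG^t holds the weights of the virtual edges leaving k at t.
    """
    WG = np.asarray(WG, dtype=float)
    P = np.eye(WG.shape[0])
    h = []
    for _ in range(t_max):
        P = WG @ P                        # P = WG^t
        h.append(float(P[:, k].sum()))    # total weight of virtual edges from k
    H = np.cumsum(h)                      # cumulative hierarchical degree
    return h, H
```

On that chain, `hierarchical_degrees(W, 0, 2)` yields $h_0 = (2, 8)$ and $H_0 = (2, 10)$; the first value is the ordinary outdegree of node 0, as expected from $h_k(t = 1)$.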







FIG. 2: A simple network (a) and its hierarchical components for $k = 1$ (b) and $k = 10$ (c). The dominant hierarchical component, containing 9 nodes, is shown in (b). We have $D_1 = 4$, $D_{10} = 2$, $N_1 = 8$, $N_{10} = 2$, $u_1(t) = 2, 2, 3, 1$, $u_{10}(t) = 1, 1$, $h_1(t) = 4, 14, 30, 48$ and $h_{10}(t) = 1, 3$.
The hierarchical degree of a node for a specific $t$ can also be understood as the traditional degree of that node in an augmented network including all virtual connections established along the successive transitions. The weight matrix of such an enlarged network is $W_{G_k}^{t}$, and its adjacency matrix can be calculated as $\delta_0(W_{G_k}^{t})$.

In order to illustrate the potential of the concepts and algorithms proposed above for the hierarchical characterization of complex networks, they have been applied to experimental data for word associations [15] and zebrafish gene sequencing. In the word association experiment, a graph was obtained for a human subject through the following psychophysical experiment. Starting with the word sun, the subject was requested to enter the first word that came to his mind after reading the program-supplied word. Except for the first word, all other words presented by the program are drawn from those previously supplied by the subject. Statistical methods are applied so as to ensure that every considered word is presented a similar number of times. A typical sequence obtained by this experiment is: sun → round; round → circle; sun → gold; gold → yellow; ...

Once a large number of such associations is obtained (1930 in the considered experiment), a weighted directed graph is determined by representing each of the words (250 for this experiment) as a node and the specific associations between two words as an edge between the respective nodes, weighted by the number of associations. Provided the outgoing weights are normalized, this type of graph can be understood as a Markov chain. It is suggested that this graph provides interesting information about the tendencies of the subject to associate words and concepts, paving the way to a series of possibilities such as the identification of the author of the associations from the respective network topological properties. As the node indegrees in such a graph were previously found to follow a power law, the transposed weight matrix is also considered in this work. In addition, as the topology of a network can be severely modified by the addition of an incorrect edge (e.g. an occasional mistake while making the associations), the graph obtained from that experiment was thresholded such that only edges with weight larger than one were considered, yielding the weight matrix $W$. The maximum total cluster size, 69, was obtained for the initial word, sun, whose derived hierarchical structure is shown in Figure 3(a); note that this is not the word sky, which corresponds to the maximum traditional degree. Such results reflect the fact that, by appearing at the very beginning of the experiment, this word implied a larger number of indirectly related words along the rest of the experiment, even though the words were presented with similar frequencies. A maximum cluster size of 154 was obtained for the inverted associations, corresponding to the word line, as illustrated in Figure 3(b). Interestingly, despite such a trend toward hierarchy, the obtained graph also incorporates several cycles defined between different hierarchical levels (see [15]), which were duly eliminated in the present investigation. The relative frequencies of the hierarchical depths and of the cluster sizes are shown in Figure 4. The dilog plots of the densities obtained for the number of hierarchical successors and for the hierarchical degree, considering all nodes in the backbone and $t < 24$, are shown in Figure 5(a-b).
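The graph construction described above can be sketched as follows in Python with NumPy. The function names and the toy list of pairs are hypothetical; the pairs echo the typical sequence quoted earlier, with one association repeated so that an edge survives the threshold. Following the text, only weights larger than one are kept, and `W.T` would give the inverted associations.

```python
import numpy as np


def association_matrix(pairs, threshold=1):
    """Build the weight matrix from (cue, response) word pairs.

    W[j, i] counts how often word i was followed by word j; entries not
    larger than `threshold` are zeroed (the text keeps weights > 1).
    """
    words = sorted({w for pair in pairs for w in pair})
    index = {w: n for n, w in enumerate(words)}
    W = np.zeros((len(words), len(words)))
    for cue, response in pairs:
        if cue != response:               # self-connections are not considered
            W[index[response], index[cue]] += 1
    W[W <= threshold] = 0
    return words, W


def transition_matrix(W):
    """Normalize outgoing weights (columns) so the graph reads as a Markov chain."""
    outgoing = W.sum(axis=0)
    return np.divide(W, outgoing, out=np.zeros_like(W), where=outgoing > 0)
```

Normalizing columns rather than rows follows from storing the edge $i \to j$ at $W(j, i)$; words without outgoing associations keep an all-zero column instead of producing a division by zero.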

The other experimental data considered in this paper concern zebrafish (Brachydanio rerio) gene sequencing data, obtained from the NIH Zebrafish Gene Collection repository (file dr_mgc_cds_aa.fasta). The raw data consisted of the amino acid sequences of the first 892 genes in that file, which were organized into subsequent pairs; the number of successive occurrences of such pairs was calculated and used to build a complex weighted network, which included 400 nodes ($20^2$ amino acid pairs) and 112582 sequential associations. The weight matrix was thresholded at 5000. The obtained depth and cluster size densities are shown in Figure 4(a) and (b). The maximum cluster was obtained for 147 nodes, which formed a densely connected cluster. The node leading to the maximum cumulative hierarchical degree $H_k^{(D_k)}$ corresponded to the amino acid pair GK, i.e. glycine and lysine.
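A possible way of building the amino acid pair network is sketched below in Python with NumPy. The text does not specify whether pairs overlap, so this sketch assumes non-overlapping consecutive pairs within each sequence and an edge from each pair to the next one; symbols outside the 20 standard one-letter amino acid codes are skipped. The names and the toy sequence are hypothetical, and the threshold of 5000 used in the paper would be passed as `threshold`.

```python
import itertools

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard one-letter codes
PAIRS = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=2)]
INDEX = {pair: n for n, pair in enumerate(PAIRS)}  # 400 nodes


def pair_network(sequences, threshold=0):
    """W[j, i] counts how often pair i is immediately followed by pair j."""
    W = np.zeros((len(PAIRS), len(PAIRS)))
    for seq in sequences:
        # Assumption: split into non-overlapping consecutive pairs.
        tokens = [seq[i:i + 2] for i in range(0, len(seq) - 1, 2)]
        for a, b in zip(tokens, tokens[1:]):
            # Skip non-standard symbols and self-connections.
            if a in INDEX and b in INDEX and a != b:
                W[INDEX[b], INDEX[a]] += 1
    W[W <= threshold] = 0
    return W
```

For the toy sequence `"MAGKGKLV"` the pairs are MA, GK, GK, LV, giving the edges MA → GK and GK → LV; the repeated GK → GK step is discarded as a self-connection.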

FIG. 3: The first three levels of the hierarchy obtained for word associations starting at the node defining the largest cluster size (a); and the last two hierarchical levels converging to the word line, obtained from the inverted associations (b).

FIG. 4: Densities of (a) depths $D_k$ and (b) cluster sizes $N_k$, for the three considered cases: filled diamond = word association; diamond = inverse word association; and × = gene sequences.

FIG. 5: Dilog densities for hierarchical successors (a) and hierarchical degrees (b) for the inverse word association data.

It is clear from the depth density plot in Figure 4(a) that the direct and inverted word associations led to similar profiles, characterized by a wide distribution of depths, while the gene sequencing data implied a small number of depth values, with a peak at 4. This is explained by the similar weights of the amino acid pair associations, which produced an abrupt transition in the connectivity of that graph as the threshold was raised. The higher density of shallow nodes (i.e. with depth not exceeding 4) observed for the inverse word associations indicates that shorter streams of associations were established with the words entered by the subject at the last stages of the experiment. Interestingly, several words were characterized by long streams of associations. The size density plot in Figure 4(b) shows an interesting high density of cluster sizes between 50 and 60, possibly because one of such clusters acts as an attractor to several of the network nodes. A large cluster was obtained for the inverse word associations, which reflects the asymmetric nature of the underlying graph. A large peak was observed for the gene sequences which, considered jointly with Figure 4(b), corroborates that most nodes in this case are accessible through a few hierarchical levels. It is observed from Figure 5(a) that the number of hierarchical successors tends to produce peaks at specific hierarchical levels $t$, indicating the presence of bottlenecks along the network. While the number of small-valued hierarchical successors tends to decrease with $t$, the opposite behavior is observed for the larger numbers. The hierarchical degree shown in Figure 5(b) presents a rich structure, indicating that this measure tends to vary considerably with $t$.

In summary, it has been shown that the underlying hierarchical structure of complex networks provides valuable information about their topology that cannot be fully appreciated by considering traditional measurements. The proposed decomposition of the network into maximal hierarchical components, as well as the concepts of hierarchical successors and hierarchical degree, allowed a more comprehensive characterization of the hierarchical structure of complex networks. It is expected that such tools will complement the characterization of complex networks that possibly incorporate a strong hierarchical backbone, such as the Internet, metabolic networks, and linguistic, social and economic systems.